


This Dataverse contains the replication material for the paper: 
"Flexible Estimation of Policy Preferences for Witnesses in Committee Hearings" by Kevin Esterling and Ju Yeon Park

* Repository date:
April 1, 2024

* Software versions:
Windows 11 Platform: x86_64-w64-mingw32/x64 (64-bit)
MultiBUGS version 2.0 or Stan 2.32.5 or JAGS 4.3.1
RStudio 2023.06.0 Build 421; R version 4.3.1 (2023-06-16 ucrt) -- "Beagle Scouts"
R packages (version): "devtools" (2.4.5), "PBSmodelling" (2.69.3), "foreign" (0.8-84), "arm" (1.13-1), "readstata13" (0.10.1), "coda" (0.19-4.1), "grid" (4.3.1), "tm" (0.7-11), "SnowballC" (0.7.1), "wordcloud" (2.6), "RColorBrewer" (1.1-3), "plotrix" (3.8-2), "ggplot2" (3.4.4), "ggridges" (0.5.5), "ggpubr" (0.6.0), "viridis" (0.6.4), "pscl" (1.5.9), "R2MultiBUGS" (0.9), rstan (2.32.5), R2jags (0.7-1.1)

* Hardware:
Processor: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz   3.60 GHz
RAM: 16.0 GB

* Software installations:
You must install MultiBUGS (Windows only), RStan, or JAGS to run this repository. For MultiBUGS installation instructions, visit: https://www.multibugs.org/download/ . As you will see, MultiBUGS requires you to first install Microsoft MPI. You only need to install the .exe version of MS-MPI, and you need to have write privileges in the directory where MultiBUGS is installed. Usually this means installing MultiBUGS under your /users directory. For Stan/RStan installation instructions, visit: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started and for JAGS installation please see https://sourceforge.net/projects/mcmc-jags/files/JAGS/ . Stan and JAGS both work with Windows, MacOS and Linux. The Run_Repository.R module automatically installs and loads any needed R packages, other than rstan, which you have to install by following its installation instructions. You can also choose to run the repository manually using the ManualMode.R script, which also shows how to do the R installations. We cannot automate MultiBUGS, JAGS or Stan installation since that depends on local aspects of your machine. Each MCMC platform provides an example model and data to test your installation. Please be sure to test your MCMC platform before trying to run the repository, so that you do not encounter a local configuration problem that is unrelated to the repository. In our experience, the Windows version of each MCMC platform worked on the first try, but we encountered configuration problems for Mac (for JAGS) and Linux (for both JAGS and Stan) that had to be resolved first.
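
As a quick pre-flight check, the sketch below reports which R packages from a list are not yet installed. It is not part of the repository: the helper name missing_packages is hypothetical, and the package list is only an illustrative subset of those named under "Software versions". Run_Repository.R and ManualMode.R handle these installations themselves, so this is a convenience only.

```r
# Hypothetical helper (not part of the repository): return the packages
# from `pkgs` that cannot be loaded from the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Illustrative subset of the packages named in this README.
needed <- c("coda", "ggplot2", "ggridges", "pscl", "arm")
print(missing_packages(needed))
```

An empty result means every package in the list can be loaded; it says nothing about whether MultiBUGS, Stan, or JAGS itself is configured correctly.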

* Model runtimes:
The repository is composed of 15 models. When using MultiBUGS, the simulations require ~1.5 hr, the rollcall analysis ~1 min, the IRT models ~0.5 hr, the application models ~2.5 hr, and running all models ~5.5 hr. When using Stan, the simulations require ~12 minutes, the rollcall analysis ~1 min, the IRT models ~6 minutes, the application models ~1.25 hours, and running all models ~1.5 hours. Using JAGS, running the full repository takes about 5 hours. Stan and JAGS estimate the models faster, but we have also purposefully set MultiBUGS to sample for a long period to ensure complete mixing of the chains. This is because the MultiBUGS implementation is the official version reported in the paper.

* Data availability statement:
This replication package is available at the Political Analysis Dataverse. https://dataverse.harvard.edu/dataverse/pan

* Contact information:
For questions about this repository, please email Kevin Esterling at kevin.esterling@ucr.edu



***** Introduction

This repository runs the reproduction for the article 'Flexible Estimation of Policy Preferences for Witnesses in Committee Hearings' (Political Analysis 2024). Please download the repository Replication_flex.zip, unpack it on your local machine, and then read the README_FIRST.txt file (which is this file).

This module requires you to install (at least) one of three MCMC estimation platforms: MultiBUGS, Stan, or JAGS. Irrespective of which sampler you choose, there are two ways to run this repository: either 1) a menu-driven automated implementation using the Run_Repository.R module located in the project root directory, or 2) manually using the ManualMode.R module which is also located in the root directory. The easiest method is to use the Run_Repository.R module and that is the appropriate method to quickly see the reproduced results. If you wish to experiment with the models such as changing the code or the data, or trying out different seeds, or if you just want to run a single model, you will need to use the manual approach. In this file, we first describe the Run_Repository.R module and then we describe the ManualMode.R module.

Either way, when you run the repository, the reproduced tables and figures are created in the repository directory called "7 - My Results", with one folder for tables and the other for figures. Note that if you run the repository multiple times, please delete the figures in this folder before each new run. R sometimes fails to overwrite a PNG that already exists (we do not know why this happens, and it is possible that it is an issue with Dropbox).
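
Clearing old figures before a re-run can be scripted. The sketch below is not part of the repository; the helper name clear_pngs is hypothetical, and it assumes you pass it the path to your own results folder. It removes every PNG below that folder and leaves all other files alone.

```r
# Hypothetical helper (not part of the repository): delete every PNG
# under `results_dir` and return the paths that were removed.
clear_pngs <- function(results_dir) {
  pngs <- list.files(results_dir, pattern = "\\.png$", recursive = TRUE,
                     full.names = TRUE, ignore.case = TRUE)
  removed <- file.remove(pngs)
  pngs[removed]
}

# Example call, assuming the working directory is the project root and the
# results folder is named as in this README:
# clear_pngs("7 - My Results")
```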

In Section A we provide instructions for running the repository in Stan/RStan; in Section B we provide instructions for running the repository using JAGS; in Section C we provide instructions for running the repository using R2MultiBUGS. In Section D we provide instructions on how to run the repository manually. Section E describes the author-saved model and results files that are included to show our results.



****Section A: How to use the Run_Repository.R module using Stan/RStan:

The results reported in the paper are from MultiBUGS, but we provide an implementation of the repository in Stan/RStan for users' convenience. Note that the results from Stan differ slightly from those using MultiBUGS, so you must use MultiBUGS to do an exact reproduction. The differences are small, however, and the Stan results are substantively similar.

VERY IMPORTANT NOTE FOR LINUX USERS: It turns out that for some versions of Linux, the local C++ compiler is a different version from the one that the official version of Stan anticipates. On some Linux machines, when this happens Stan will throw an error message that indicates a compiling error, but on other machines it will look like there is an error in the repository itself.  Running the repository should result in no errors. If you are using the official version of Stan and encounter an error, please uninstall and reinstall Stan using these lines... 

remove.packages(c("StanHeaders", "rstan"))
install.packages("StanHeaders", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
install.packages("rstan", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

... and hopefully that will correct the problem. If this does not correct the problem, please switch to JAGS using the instructions below. Stan worked for us with no issue for Windows and Mac.

Before you run the Run_Repository.R master script, be certain to set the working directory to the project's root directory in your IDE. Failure to do this will break all of the path commands. 
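
One way to confirm the working directory before starting is to check that the two entry-point scripts named in this README are present. This is a minimal sketch and not part of the repository; the helper name in_project_root is hypothetical.

```r
# Hypothetical helper (not part of the repository): TRUE if `dir` holds
# both entry-point scripts named in this README.
in_project_root <- function(dir = getwd()) {
  all(file.exists(file.path(dir, c("Run_Repository.R", "ManualMode.R"))))
}

if (!in_project_root()) {
  message("Working directory is ", getwd(),
          "; use setwd() to move to the project root before running.")
}
```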

To run this module, you need to gather some information in advance:

1. You must be certain that you have installed the R package rstan locally, which also installs Stan. For installation instructions, visit: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started . As you will see, installing RStan requires you to have already installed RTools. Please follow the installation instructions exactly, and please test your local configuration using the supplied example code and data before trying to run this repository. Stan does not always work out of the box.

2. Decide in advance if you want to run the repository in test mode or estimation mode. Test mode runs the repository very quickly but does not estimate a correct posterior. Test mode is only used to check that the repository works on your machine. Please use test mode your first time running the repository to check if there are any local dependencies missing or other issues before running estimation mode. The repository should run without any errors.

3. When you run the Run_Repository.R module, you will be given a menu of the different models you can run. These choices are:
    Press 1 for Simulation (12 min) 
    Press 2 for Rollcall Analysis (1 min) 
    Press 3 for IRT models (6 min)  
    Press 4 for Application (1.25 hours) 
    Press 5 for Run all (1.5 hours) 
    To Exit, press 6 
The runtimes in parentheses are approximate and are the times required for estimation mode using Stan. These five options follow the sequence of the article Appendix. Please see the Appendix for details.

4. While the paper has a total of 15 models, note that only two are needed for the core results that are shown in Figures 1-3 and appendix Table 4, which compares the constrained-regression model results to the flexible model results. The rest of the models are for proof of identification, demonstration of the model properties, or are robustness checks. To run only the two core models, choose option 4 (Application) from the main menu, and then you will be given the option to choose to only run the core models or to run all five of the application models.

5. When you run the Run_Repository.R module in estimation mode, the reproduced results will populate in the project subdirectory "7 My Results." This directory has one folder for the reproduced tables 3 and 4 and the healthcare rollcall analysis, and another folder for the reproduced figures 1-14. Summaries of the full posteriors for each model are saved in each model subdirectory in a file named "MyStats.txt". Note that every time you run the repository in estimation mode, each of the MyStats.txt and each of the results in the My Results subdirectory will be overwritten.

6. When you are ready to run the repository, gather the information above, then inside of your R IDE, navigate to the root directory, and then open and run the Run_Repository.R file, which only contains a single line of code. Note: if you have already run the repository, please delete the figures from the folder in "7 - My Results" before re-running.
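
Because each estimation run overwrites the MyStats.txt files, you may want to keep copies of an earlier run. The sketch below is one possible approach and is not part of the repository: the helper name backup_stats and the naming pattern for the copies are hypothetical.

```r
# Hypothetical helper (not part of the repository): copy every MyStats.txt
# under `root` to a sibling file with `tag` added to its name, and return
# the paths of the copies that were written.
backup_stats <- function(root, tag = format(Sys.time(), "%Y%m%d-%H%M%S")) {
  src <- list.files(root, pattern = "^MyStats\\.txt$", recursive = TRUE,
                    full.names = TRUE)
  if (length(src) == 0) return(character(0))
  dest <- file.path(dirname(src), paste0("MyStats_", tag, ".txt"))
  ok <- file.copy(src, dest, overwrite = FALSE)
  dest[ok]
}

# Example call from the project root:
# backup_stats(".")
```

The copies sit next to the originals, so a later run replaces only the files named exactly MyStats.txt.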

DIFFERENCES BETWEEN STAN AND MULTIBUGS: Stan currently has a few limitations that make it less ideal for this application compared to MultiBUGS.

First, Stan cannot accommodate discrete missing data, although this does not have a major impact in our application given the very low rates of missing data. To accommodate this limitation, in the application module we drop one dyad with missing counts in the outcome model, and we impute missing ideology survey item responses with ideologically consistent values. This imputes a total of 9 values (1.25% of all possible observations) for 6 different witnesses (no imputation is needed for FMCs). By "ideologically consistent" we mean that, since every witness provided some responses to the ideology battery, we could observe whether the non-missing responses were liberal or conservative. To impute the missing responses, if the witness provided liberal-leaning responses we imputed a "2" (or a "4" on the markets question), and vice versa if they provided conservative-leaning responses. This is not ideal, but it is a solution that enables Stan to work for the repository.

Second, Stan does not have a "cut" function similar to that in MultiBUGS. The cut function allows information to flow into one part of the model for updating, but not to flow back to the other part of the model. In the MultiBUGS implementation, we use the cut function to directly implement the constrained-regression model so that the bridging to witness preferences occurs dynamically in the model. For Stan, we instead needed to create predicted witness preferences to read in as data. This does not affect the estimation, but it necessarily makes the scales for the witness preferences differ across the two implementations, and hence the point estimates and standard errors differ by a scale factor.

Third, in the MultiBUGS version we impose a soft constraint to prevent the witness preference dimension from rotating past the orthogonal direction. When we impose this soft constraint in Stan, Stan reports warnings, and hence we do not use it. As we mention in the paper, the results are the same in all regards with or without the soft constraint. The one difference is that figure 2 in the Stan reproduction will show a rotation that is essentially orthogonal but slightly obtuse, which is hard to interpret.
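
The scale-factor point can be illustrated with simulated numbers. The sketch below uses made-up draws and an arbitrary factor k; neither comes from the repository. Multiplying every posterior draw by k multiplies both the posterior mean and the posterior standard deviation by k, so their ratio is unchanged.

```r
set.seed(1)
draws <- rnorm(3000, mean = 0.8, sd = 0.3)  # made-up posterior draws
k <- 2.5                                    # arbitrary scale factor
rescaled <- k * draws

# Mean and standard deviation both scale by k; their ratio is unchanged.
stopifnot(isTRUE(all.equal(mean(rescaled), k * mean(draws))))
stopifnot(isTRUE(all.equal(sd(rescaled), k * sd(draws))))
stopifnot(isTRUE(all.equal(mean(rescaled) / sd(rescaled),
                           mean(draws) / sd(draws))))
```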

** Note, if you do not intend to implement the models using MultiBUGS, JAGS or manually, you can stop reading here. If you wish to learn more about the repository and how to run the models in JAGS, in MultiBUGS using R2MultiBUGS, or manually, please keep reading.



****Section B: How to use the Run_Repository.R module using JAGS:

The results reported in the paper are from MultiBUGS, but we provide an implementation of the repository in JAGS for users' convenience. Note that the results from JAGS differ slightly from those using MultiBUGS, so you must use MultiBUGS to do an exact reproduction. The differences are small, however, and the JAGS results are substantively similar.

Before you run the Run_Repository.R master script, be certain to set the working directory to the project's root directory in your IDE. Failure to do this will break all of the path commands. 

To run this module, you need to gather some information in advance:

1. You must be certain that you have installed JAGS locally. For installation instructions, visit: https://sourceforge.net/projects/mcmc-jags/files/JAGS/ . Please follow the installation instructions exactly. For us, the built-in package installer worked well on Debian-based Linux distributions (such as Ubuntu) using the command "apt-get install jags", but this will not work with all Linux distributions. Note that Mac users must take care to install the JAGS version that matches their OS. Please test your local configuration using the JAGS example code and data before trying to run this repository.

2. Decide in advance if you want to run the repository in test mode or estimation mode. Test mode runs the repository very quickly but does not estimate a correct posterior. Test mode is only used to check that the module works on your machine. Please use test mode your first time running the repository to check if there are any local dependencies missing or other issues before running estimation mode.

3. When you run the Run_Repository.R module, you will be given a menu of the different models you can run. These choices are:
    Press 1 for Simulation (1.5 hours) 
    Press 2 for Rollcall Analysis (1 min) 
    Press 3 for IRT models (15 min)  
    Press 4 for Application (2.5 hours) 
    Press 5 for Run all (4.5 hours) 
    To Exit, press 6 
The runtimes in parentheses are approximate and are the times required for estimation mode using JAGS. These five options follow the sequence of the article Appendix. Please see the Appendix for details.

4. While the paper has a total of 15 models, note that only two are needed for the core results that are shown in Figures 1-3 and appendix Table 4, which compares the constrained-regression model results to the flexible model results. The rest of the models are for proof of identification, demonstration of the model properties, or are robustness checks. To run only the two core models, choose option 4 (Application) from the main menu, and then you will be given the option to choose to only run the core models or to run all five of the application models.

5. When you run the Run_Repository.R module in estimation mode, the reproduced results will populate in the project subdirectory "7 My Results." This directory has one folder for the reproduced tables 3 and 4 and the healthcare rollcall analysis, and another folder for the reproduced figures 1-14. Summaries of the full posteriors for each model are saved in each model subdirectory in a file named "MyStats.txt". Note that every time you run the repository in estimation mode, each of the MyStats.txt and each of the results in the My Results subdirectory will be overwritten.

6. When you are ready to run the repository, gather the information above, then inside of your R IDE, navigate to the root directory, and then open and run the Run_Repository.R file, which only contains a single line of code. Note: if you have already run the repository, please delete the figures from the folder in "7 - My Results" before re-running.

** Note, if you do not intend to implement the models using MultiBUGS or manually, you can stop reading here. If you wish to learn more about the repository and how to run the models in MultiBUGS using R2MultiBUGS or manually, please keep reading.



****Section C: How to use the Run_Repository.R module using MultiBUGS:

Important note: Running the repository using MultiBUGS currently requires Windows. The reason for this restriction is that the R package we use to implement the repository, R2MultiBUGS, currently only has a Windows version available.

Because of the time it takes to estimate the models, be sure to go into Windows settings and turn off sleep mode (and then turn it back on when done!). 

Before you run the Run_Repository.R master script, be certain to set the working directory to the project's root directory. Failure to do this will break all of the path commands. 

To run this module, you need to gather some information in advance:

1. You must be certain that you have installed MultiBUGS locally. For installation instructions, visit: https://www.multibugs.org/download/ . MultiBUGS requires that you have write access to the location where it saves temporary files, which they explain on the installation page. Because you need write access to the directory where MultiBUGS.exe is installed, the safest bet is to create C:/Users/User Name/MultiBUGS-2.0/ and install there.

2. You must be able to type in the full path to your MultiBUGS installation. In Windows, you can right click the MultiBUGS.exe file and copy the path to the clipboard. When you run the module, you will be prompted to type in the path. Be sure you include only the path using forward slashes; do not include quotes, do not place any spaces at the end, and do not include the .exe filename -- just type in the path. 

3. Decide in advance if you want to run the repository in test mode or estimation mode. Test mode runs the repository in about 7 minutes but does not estimate a correct posterior. Test mode is only used to check that the module works on your machine. Please use test mode your first time running the repository to check if there are any local dependencies missing or other issues before running estimation mode.

4. In estimation mode, you will be prompted to choose whether to run the repository in debug mode or not. Debug mode freezes MultiBUGS after each model has completed its iterations, allowing you to check the model for convergence or to save any output, and then once you close MultiBUGS that will send the results back into R. If you do not use debug mode, MultiBUGS will automatically close after the iterations have completed. Debug mode is not recommended if you run the full repository.

5. When you run the Run_Repository.R module, you will be given a menu of the different models you can run. These choices are:
    Press 1 for Simulation (~1.5 hr) 
    Press 2 for Rollcall Analysis (~2 min) 
    Press 3 for IRT models (~0.5 hr)  
    Press 4 for Application (~2.5 hr) 
    Press 5 for Run all (~5.5 hr) 
    To Exit, press 6 
The runtimes in parentheses are approximate and are the times required for estimation mode. These five options follow the sequence of the article Appendix. Please see the Appendix for details.

6. While the paper has a total of 15 models, note that only two are needed for the core results that are shown in Figures 1-3 and appendix Table 4, which compares the constrained model results to the flexible model results. The rest of the models are for proof of identification, demonstration of the model properties, or are robustness checks. To run only the two core models, choose option 4 (Application) from the main menu, and then you will be given the option to choose to only run the core models or to run all five of the application models.

7. When you run the Run_Repository.R module in estimation mode, the reproduced results will populate in the project subdirectory "7 My Results." This directory has one folder for the reproduced tables 3 and 4, and another folder for the reproduced figures 1-14. Summaries of the full posteriors for each model are saved in each model subdirectory in a file named "MyStats.txt". Note that every time you run the repository in estimation mode, each of the MyStats.txt and each of the results in the My Results subdirectory will be overwritten. 

8. When you are ready to run the repository, gather the information above, then inside of your R IDE, navigate to the root directory, and then open and run the Run_Repository.R file, which only contains a single line of code.  Note: if you have already run the repository, please delete the figures from the folder in "7 - My Results" before re-running.
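
Step 2 above asks for the MultiBUGS path in a specific form: forward slashes, no quotes, no trailing spaces, and no .exe filename. The sketch below shows one way to turn a path copied from Windows into that form before typing it at the prompt. It is not part of the repository; the helper name clean_multibugs_path and the example user folder are hypothetical, and whether the prompt expects a trailing slash is not stated in this README.

```r
# Hypothetical helper (not part of the repository): normalize a path copied
# from Windows into the form described in step 2.
clean_multibugs_path <- function(x) {
  x <- trimws(x)                                   # drop leading/trailing spaces
  x <- gsub("^[\"']|[\"']$", "", x)                # drop surrounding quotes
  x <- gsub("\\", "/", x, fixed = TRUE)            # backslashes to forward slashes
  sub("/[^/]*\\.exe$", "", x, ignore.case = TRUE)  # drop the .exe filename
}

# Example with a made-up installation folder:
clean_multibugs_path("\"C:\\Users\\me\\MultiBUGS-2.0\\MultiBUGS.exe\"")
# returns "C:/Users/me/MultiBUGS-2.0"
```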

** Note, if you do not intend to manually implement the models, you can stop reading here. If you wish to learn more about the repository and how to run the models manually, please keep reading.




****Section D: Manual Method:

The repository is organized by subdirectory. Most subdirectories contain the files required to replicate one specific model or analysis in the paper -- such as a column in a table or a figure.  In addition, there is one subdirectory that has the raw data and codebooks, one that has the code to replicate the figures, and one that contains the original documentation for the project including surveys and the coding rules. The subdirectories are numbered and organized in the sequence that each analysis is found in the appendix. Please see the Appendix as a guide. The project subdirectory structure is as follows:

0 Raw Data
1 Simulation (replication for the models in Figure 5 in Appendix A.1)
2 Heatlhcare Roll Call Analysis (replication for Appendix A.3)
3 Common Space IRT (replication for Table 3 and Figure 7 in Appendix A.5)
4 Application (replication for Table 4 and Figures 1-3, the additional power analysis for the application, and the Bonica CF replication in Appendix A.5)
5 Code to Reproduce Figures (which uses our stored results -- to create figures using your reproduced results you will need to use Run_Repository.R or ManualMode.R)
6 Original Surveys and Coding Documents
7 My Results

To run the reproduction for a given model, begin in RStudio, and navigate to the project root directory. From there, run the R file in the project root directory "ManualMode.R". The code in that file will create an object called "sourceDir" that stores the path of your root directory, which is necessary to run this repository. Be certain to install and test your MCMC sampler following the instructions above; ManualMode.R will then handle any R package installations and load the libraries needed for the MCMC sampler you are using. Once you have run the ManualMode.R module, use the RStudio GUI menu to set the working directory to the subdirectory that contains the analysis you want to conduct, then open the "Run_Model" file corresponding to your MCMC sampler. For Stan, the file to run the subdirectory model is called Run_StanModel.R. For JAGS it is Run_JAGSModel.R. For MultiBUGS it is Run_Model.R. Once you open the R script, you can press the "source" button in RStudio and that will run the model. If you choose to run the script line-by-line, be certain in each case to run the last few lines of the script, which contain the detach and rm() commands.
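
The manual steps above can also be sketched from the R console instead of the RStudio menus. This is only an illustration and not part of the repository: the helper name run_script_for is hypothetical, and the subdirectory path is left as a placeholder because it depends on the analysis you choose.

```r
# Hypothetical helper (not part of the repository): map a sampler name to
# the run script this README names for it.
run_script_for <- function(sampler) {
  switch(tolower(sampler),
         stan      = "Run_StanModel.R",
         jags      = "Run_JAGSModel.R",
         multibugs = "Run_Model.R",
         stop("Unknown sampler: ", sampler))
}

# Manual workflow, shown as comments because it needs the repository files:
# source("ManualMode.R")    # run from the project root; creates sourceDir
# setwd("<subdirectory that contains the analysis you want>")
# source(run_script_for("stan"))
```

Calling source() on the whole script runs it to the end, which also executes the closing detach and rm() lines.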

Note: if you have already run the repository, please delete the figures from the folder in "7 - My Results" before re-running.

In MultiBUGS, we choose the options to not distribute over nodes and to fix founders. For each model we report in the paper, we used three chains with a 100k burnin period and then sampled 300k per chain, saving every 300 draws for a total posterior sample of 3k.  
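
The sampler settings above imply the stated posterior sample size. The following lines simply check that arithmetic; the variable names are ours and are not taken from the repository scripts.

```r
n_chains <- 3
n_burnin <- 100000   # burn-in iterations per chain, discarded
n_sample <- 300000   # post-burn-in iterations per chain
n_thin   <- 300      # keep every 300th draw

kept_per_chain <- n_sample / n_thin                  # 1000 draws per chain
total_kept     <- n_chains * kept_per_chain          # 3000 draws in total
total_iter     <- n_chains * (n_burnin + n_sample)   # 1,200,000 iterations run
```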




****Section E: Author-created results:

Each of the subdirectories contains the original model, data, inits and results files that we produced using MultiBUGS and that hold the results published in the paper. You can use these directly in MultiBUGS if you prefer that to automating the analysis via R. Each directory has files that use these naming conventions: 1) the model file has "model" in the filename, 2) the data file has "data" in the filename, 3) the initial values files have "inits" in the filename, 4) the saved states, which let you restart the sampler in the stationary distribution, have "state" in the filename, 5) the coda saved sample and index files have "coda" in the filename, 6) the summaries of the posterior are saved in the file named stats.txt, and 7) the traceplots are in the file named history.odc (which you can open inside of MultiBUGS to view). Finally, if you wish to recreate the figures, please use the figure code files in the "5 Code to Reproduce Figures" subdirectory, which will use our saved results to generate the figures.
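
The naming conventions above make it easy to locate files of one kind from R. The sketch below is not part of the repository; the helper name find_by_role is hypothetical, and it assumes only that the keyword appears somewhere in the filename, as described above.

```r
# Hypothetical helper (not part of the repository): list the files in `dir`
# whose names contain the keyword for the given role.
find_by_role <- function(dir,
                         role = c("model", "data", "inits", "state", "coda")) {
  role <- match.arg(role)
  list.files(dir, pattern = role, ignore.case = TRUE, full.names = TRUE)
}

# Example call from inside a model subdirectory:
# find_by_role(".", "coda")
```

The matching coda files could then be read with a reader such as coda::read.coda, which takes an output file and an index file.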



****Section F: Contact info

For questions or comments about the repository, please email kevin.esterling@ucr.edu



